1. Introduction

The traditional approach to analyzing data from X-ray emulsion
chamber experiments consists in considering one-dimensional
distributions or the dependence of one observed quantity on
another.

Recently, the analysis of two- and three-dimensional
distributions /1/, together with the presentation of the averages
of two variables with their errors, made it possible to conclude
that scaling is violated in the fragmentation region of secondary
particles at energies ~10^16 eV and to estimate the degree of this
violation /2/.

Thus, increasing the number of simultaneously analyzed features
seems attractive. However, analyzing three or more features
clearly requires a quantitative measure of the difference between
multidimensional distributions represented by limited samples.

Another main problem of cosmic ray physics may be formulated as
determining the portion of experimental events that belong to one
of several described types, e.g. determining the portion of
photon-hadron (γ-h) families generated by various primary nuclei,
or determining the chemical composition of the primary component
from EAS data.

In the present paper we show that both problems can be solved in
the multidimensional feature space by the nonparametric statistical
methods developed in /3-5/. Note that applying these methods to
experimental data from cosmic ray installations has demonstrated
their advantage over the traditionally applied techniques /4,6,7/.

This is a methodological work, i.e. the experimental events are
replaced by model ones. This gives us an opportunity to determine
the limits of applicability of the suggested methods and to
estimate the expected accuracy of determining the desired physical
quantities.

Among the models used there are a pure-scaling model (M6) and
models with increasing cross sections and scaling violation in the
pionization region (M4, Femin, Femax); a detailed description of
the models may be found in /8/. Some models are obviously
unrealistic (e.g. M6), but for methodological purposes it is
admissible to use the family banks corresponding to these models.

2. The Distinction of Strong Interaction Models 

The feature set optimal for discrimination is selected taking into
account both the differences of the averages and the correlation
information: feature pairs whose correlations differ statistically
significantly are included in the set. Finally, each set is
characterized by the so-called Bayesian risk R, i.e. the
probability of misclassifying the models (or the model and the
experimental data) in a classification procedure performed with the
optimal Bayesian decision rule (for details of the method and the
features used see /9/).
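As an illustration of how R can be estimated nonparametrically,
the sketch below builds a leave-one-out Gaussian kernel density
estimate for each class and assigns every event to the class with
the larger estimated density (the Bayesian rule for equal priors).
The kernel, the Scott's-rule bandwidth and the name
`loo_bayes_risk` are assumptions made for this example; the
estimator of /10/ may differ in these details.

```python
import numpy as np

def loo_bayes_risk(a, b):
    """Leave-one-out estimate of the Bayesian risk R for two samples
    (equal priors), using a product Gaussian kernel density per class."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    x = np.vstack([a, b])
    in_a = np.r_[np.ones(len(a), dtype=bool), np.zeros(len(b), dtype=bool)]
    n, d = x.shape
    # Scott's-rule bandwidth per coordinate (an assumption for this sketch)
    h = x.std(axis=0, ddof=1) * n ** (-1.0 / (d + 4))
    z = x / h
    sq = np.sum(z**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * z @ z.T   # squared scaled distances
    k = np.exp(-0.5 * np.maximum(d2, 0.0))           # same h for both classes, so normalisation cancels
    np.fill_diagonal(k, 0.0)                         # leave each event out of its own estimate
    # Divide by the number of events actually used (n - 1 for the event's own class)
    dens_a = k[:, in_a].sum(axis=1) / (len(a) - in_a.astype(int))
    dens_b = k[:, ~in_a].sum(axis=1) / (len(b) - (~in_a).astype(int))
    pred_b = dens_b > dens_a                         # Bayesian rule for equal priors
    err_a = np.mean(pred_b[in_a])                    # a-events assigned to class b
    err_b = np.mean(~pred_b[~in_a])                  # b-events assigned to class a
    return float(0.5 * (err_a + err_b))

rng = np.random.default_rng(0)
same = loo_bayes_risk(rng.normal(size=300), rng.normal(size=300))
shifted = loo_bayes_risk(rng.normal(size=300), rng.normal(2.0, 1.0, size=300))
print(f"identical samples: R = {same:.2f}; shifted by 2 sigma: R = {shifted:.2f}")
```

Leaving each event out of its own class estimate keeps the
risk estimate from being optimistically biased by the event
"recognizing itself".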

In one-dimensional analysis, the use of R leads to the same
conclusions as the standard statistical tests of the significance
of the difference between the averages of two samples (t-test and
Wilcoxon test). The more R differs from 0.5 (which corresponds to
total overlap of the classes), the stronger the difference between
the distributions.
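A worked one-dimensional case makes the link between R and 0.5
concrete. For two normal distributions with equal variance sigma,
equal priors and mean difference delta, the optimal rule is a cut
at the midpoint and R = Phi(-|delta| / (2 sigma)), where Phi is the
standard normal distribution function. This textbook case is added
for illustration only and is not taken from the models studied
here.

```python
from math import erf, sqrt

def gaussian_bayes_risk(delta_mu, sigma):
    """Bayesian risk of the optimal rule for two equal-variance normal
    distributions with equal priors: R = Phi(-|delta_mu| / (2 sigma))."""
    z = -abs(delta_mu) / (2.0 * sigma)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))   # standard normal CDF at z

# R falls from 0.5 (identical distributions) as the shift grows
for shift in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(shift, round(gaussian_bayes_risk(shift, 1.0), 3))
```

The same mean shift is what a t-test would detect, which is why
the two approaches agree in one dimension.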

The Bayesian risk calculation method /10/ allows one to obtain
unbiased efficient estimates and, by successively comparing the
experimental sample with the training samples of alternative
models, to judge which model describes the experiment best.
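The successive comparison can be summarized as a ranking: the
model whose R against the experimental sample is largest (closest
to 0.5) is the hardest to distinguish from the data. The sketch
below assumes the R values have already been estimated; the model
names and numbers are hypothetical, and the binomial standard error
sqrt(R(1 - R)/n) is only a rough approximation.

```python
from math import sqrt

def rank_models(results):
    """Rank candidate models by their Bayesian risk against the experiment.

    results maps a model name to (R, n), n being the number of events
    used in the comparison. Larger R (closer to 0.5) means the model is
    harder to tell apart from the data."""
    report = []
    for name, (r, n) in sorted(results.items(), key=lambda kv: kv[1][0], reverse=True):
        sigma = sqrt(r * (1.0 - r) / n)   # rough binomial standard error
        # distance from total overlap (R = 0.5) in units of the standard error
        report.append((name, r, (0.5 - r) / sigma))
    return report

# Hypothetical numbers, for illustration only
results = {"model A": (0.47, 500), "model B": (0.38, 500), "model C": (0.30, 500)}
for name, r, z in rank_models(results):
    print(f"{name}: R = {r:.2f}, {z:.1f} standard errors below 0.5")
```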

The accuracy of the estimate depends on the size of the training
samples used. Moreover, the sample size limits the maximum
dimensionality of the space in which the multidimensional
probability density can be effectively reconstructed locally. As
Table 1 shows, with samples limited to 100 to 400 events, adding
low-informative features may even worsen the discrimination because
of the scarcity of points in the N-dimensional feature space.
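The effect can be reproduced qualitatively on synthetic data. In
the sketch below only the first coordinate separates the two
classes and every additional coordinate is pure noise; with 200
events per class, the leave-one-out kernel estimate of R tends to
drift back toward 0.5 as noise dimensions are added. The Gaussian
data, the kernel estimator and its bandwidth are assumptions for
illustration, so the numbers are not comparable with Table 1.

```python
import numpy as np

def loo_bayes_risk(a, b):
    """Leave-one-out Gaussian-kernel estimate of the Bayesian risk (equal priors)."""
    x = np.vstack([a, b])
    in_a = np.r_[np.ones(len(a), dtype=bool), np.zeros(len(b), dtype=bool)]
    n, d = x.shape
    h = x.std(axis=0, ddof=1) * n ** (-1.0 / (d + 4))   # Scott's rule (assumed)
    z = x / h
    sq = np.sum(z**2, axis=1)
    k = np.exp(-0.5 * np.maximum(sq[:, None] + sq[None, :] - 2.0 * z @ z.T, 0.0))
    np.fill_diagonal(k, 0.0)                            # leave-one-out
    dens_a = k[:, in_a].sum(axis=1) / (len(a) - in_a.astype(int))
    dens_b = k[:, ~in_a].sum(axis=1) / (len(b) - (~in_a).astype(int))
    pred_b = dens_b > dens_a
    return float(0.5 * (np.mean(pred_b[in_a]) + np.mean(~pred_b[~in_a])))

rng = np.random.default_rng(1)
n = 200                      # events per class
risks = {}
for dim in (1, 2, 4, 10, 15):
    a = rng.normal(size=(n, dim))
    b = rng.normal(size=(n, dim))
    b[:, 0] += 1.0           # only coordinate 0 carries information
    risks[dim] = loo_bayes_risk(a, b)
    print(dim, round(risks[dim], 3))
```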


Table 1. Comparison of the M4 and M6 models by means of various
γ-family characteristics.

   Space dimensionality    R
   1                       0.38 ± 0.3
   2                       0.37 ± 0.3
   3                       0.35 ± 0.3
   4                       0.34 ± 0.3
   10                      0.38 ± 0.3
   15                      0.40 ± 0.3


Comparing the M4 and M6 models by means of various feature
combinations has shown that adding information from the hadronic
block or the shower installation to the γ-family information does
not reduce the classification errors. That is, in problems related
to the study of strong interaction cross sections and scaling
violation in the pionization region, it is sufficient to analyze
only the photon family characteristics.

3. Separation of Families from Light and Heavy Nuclei.
Determination of the Portion of Families from Fe Nuclei

Let us take as prototypes (training samples) the samples
containing events from light nuclei and, respectively, events from
the banks Femin and Femax. Various mixtures of both types, made up
of events not included in the training samples, will be taken as
the "experimental" sample. Such a choice imitates the case of
precise knowledge of the strong interaction model. The portion of
"iron" events in these samples is set to P_H = 0.05, 0.07 and 0.1.
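The construction of such "experimental" mixtures from held-out
events can be sketched as follows. The feature arrays `light` and
`heavy`, their sizes and the helper `make_mixture` are hypothetical
stand-ins for the family banks; the point is only that the events
are drawn without replacement from outside the training samples
with a prescribed fraction P_H of "iron" families.

```python
import numpy as np

def make_mixture(light, heavy, p_heavy, n, rng):
    """Draw an "experimental" sample of n events with a fraction p_heavy of
    heavy-nucleus (Fe) families, sampling held-out events without replacement."""
    n_heavy = int(round(p_heavy * n))
    idx_h = rng.choice(len(heavy), size=n_heavy, replace=False)
    idx_l = rng.choice(len(light), size=n - n_heavy, replace=False)
    sample = np.concatenate([heavy[idx_h], light[idx_l]])
    truth = np.r_[np.ones(n_heavy, dtype=int), np.zeros(n - n_heavy, dtype=int)]
    order = rng.permutation(n)   # shuffle so event order carries no label information
    return sample[order], truth[order]

rng = np.random.default_rng(42)
# Hypothetical held-out banks with 3 features per family
light = rng.normal(0.0, 1.0, size=(2000, 3))
heavy = rng.normal(1.0, 1.0, size=(300, 3))
for p in (0.05, 0.07, 0.1):
    sample, truth = make_mixture(light, heavy, p, 1000, rng)
    print(p, sample.shape, truth.sum())
```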

The portion of families initiated by Fe nuclei is estimated after
classifying the experimental data and calculating the probabilities
of misclassifying events of both types:

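$$\hat{P}_H = \frac{\tilde{P}_H - R_{L\to H}}{1 - R_{H\to L} - R_{L\to H}},$$

where $\tilde{P}_H$ is the portion of families assigned to the
"iron" type, and $R_{L\to H}$ and $R_{H\to L}$ are the probabilities
of the two possible classification errors (a light-nucleus family
classified as "iron" and vice versa). The expression follows from
solving $\tilde{P}_H = P_H(1 - R_{H\to L}) + (1 - P_H)R_{L\to H}$
for $P_H$.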
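The correction for misclassification, P_hat = (P_tilde - R_LH) /
(1 - R_HL - R_LH), is short enough to write as a function. The
sketch below also propagates the binomial error of the classified
fraction; it deliberately ignores the uncertainty of the error
rates themselves, which is an assumption of the sketch. The rates
in the usage example are hypothetical.

```python
from math import sqrt

def reconstruct_fraction(p_classified, r_lh, r_hl, n=None):
    """Correct the fraction of events classified as heavy for both
    misclassification probabilities: (P_tilde - R_LH) / (1 - R_HL - R_LH).
    If n (number of classified events) is given, also return the binomial
    error of P_tilde propagated through the formula (errors of R ignored)."""
    denom = 1.0 - r_hl - r_lh
    if denom <= 0:
        raise ValueError("classifier no better than random: 1 - R_HL - R_LH <= 0")
    p_hat = (p_classified - r_lh) / denom
    if n is None:
        return p_hat
    sigma = sqrt(p_classified * (1.0 - p_classified) / n) / denom
    return p_hat, sigma

# Hypothetical error rates; forward-model the classified fraction for P_H = 0.07
r_lh, r_hl = 0.15, 0.35
p_tilde = 0.07 * (1.0 - r_hl) + 0.93 * r_lh
print(reconstruct_fraction(p_tilde, r_lh, r_hl, n=1000))
```

The denominator shows why a poor classifier is costly: as
R_HL + R_LH approaches 1, both the estimate and its error blow up.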

Table 2 shows that the reconstructed value of P_H is rather close
to the true one. Moreover, it can be shown that the classification
makes it possible to enrich the selected events with families from
heavy nuclei by a factor of 5 to 7, which may enable one to study
Fe-N interactions at energies above 10^16 eV.
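The enrichment factor follows from the same error rates: among
events classified as "iron", the true Fe fraction (purity) is
P_H(1 - R_HL) / [P_H(1 - R_HL) + (1 - P_H)R_LH], and the enrichment
is the purity divided by P_H. The function name and the rates in
the example are hypothetical and do not reproduce the paper's
numbers.

```python
def enrichment(p_heavy, r_lh, r_hl):
    """Purity of the 'heavy' selection and its enrichment over the input fraction."""
    selected_heavy = p_heavy * (1.0 - r_hl)     # true Fe families that are kept
    selected_light = (1.0 - p_heavy) * r_lh     # light families leaking into the selection
    purity = selected_heavy / (selected_heavy + selected_light)
    return purity, purity / p_heavy

# Hypothetical rates: P_H = 0.1, R_LH = 0.1, R_HL = 0.3
print(enrichment(0.1, 0.1, 0.3))
```

For a small input fraction the enrichment is driven mainly by
R_LH, the leakage of the far more numerous light-nucleus families.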

Table 2. Reconstruction of the portion of families from Fe
nuclei. The training and control samples are taken from the
banks M6 and Femin.

   Installation           True P_H   Classified as Fe   Reconstructed P_H
   γ-block                0.05       0.194              0.042 ± 0.060
                          0.07       0.202              0.055 ± 0.058
                          0.1        0.217              0.080 ± 0.058
   γ-block                0.05       0.09               0.048 ± 0.025
   + shower part          0.07       0.115              0.074 ± 0.024
                          0.1        0.138              0.098 ± 0.023
   γ-h block              0.05       0.125              0.044 ± 0.028
                          0.07       0.149              0.076 ± 0.026
                          0.1        0.171              0.099 ± 0.025
However, if the strong interaction model is unknown (this case was
simulated by taking the "experimental" data from banks other than
those used as prototypes), the reconstruction has large errors.
Therefore, to treat real experimental data one should use either
more realistic models or feature combinations that depend weakly on
the strong interaction model but are highly sensitive to the type
of the primary nucleus.
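A small numerical example shows why a wrong model is so damaging.
Below, the classified fraction is generated with one pair of
misclassification rates (those of the model that "produced" the
data) and corrected with another pair (those measured on prototypes
from a different model). All rates are invented for illustration.

```python
def observed_fraction(p_heavy, r_lh, r_hl):
    """Fraction classified as heavy, given the true fraction and error rates."""
    return p_heavy * (1.0 - r_hl) + (1.0 - p_heavy) * r_lh

def reconstruct(p_classified, r_lh, r_hl):
    """Misclassification-corrected estimate of the heavy fraction."""
    return (p_classified - r_lh) / (1.0 - r_hl - r_lh)

p_true = 0.07
# Hypothetical rates of the model that generated the data ...
p_obs = observed_fraction(p_true, r_lh=0.20, r_hl=0.35)
# ... corrected with rates estimated from a different prototype model
p_hat = reconstruct(p_obs, r_lh=0.15, r_hl=0.35)
print(p_true, round(p_hat, 3))
```

A shift of only 0.05 in R_LH more than doubles the estimated
fraction here, because the leakage term is multiplied by the large
light-nucleus fraction 1 - P_H.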

4. Conclusion 

The use of nonparametric statistical methods allows one to carry
out a quantitative comparison of model and experimental data. The
same methods enable one to select the events initiated by heavy
nuclei and to determine the portion of such events. For this
purpose, banks of artificial events that describe the experiment
sufficiently well are necessary. At present, the model with small
scaling violation in the fragmentation region /11/ is the closest
to the experiment. Therefore, the γ-families obtained in the
"Pamir" experiment are currently being treated using this model.